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Abstract 

This paper proposes a new method to reverse engineer gene regulatory networks 
from experimental data. The modeling framework used is time-discrete determin- 
istic dynamical systems, with a finite set of states for each of the variables. The 
simplest examples of such models are Boolean networks, in which variables have 
only two possible states. The use of a larger number of possible states allows a finer 
discretization of experimental data and more than one possible mode of action for 
the variables, depending on threshold values. Furthermore, with a suitable choice of 
state set, one can employ powerful tools from computational algebra, that underlie 
the reverse-engineering algorithm, avoiding costly enumeration strategies. To per- 
form well, the algorithm requires wildtype together with perturbation time courses. 
This makes it suitable for small to meso-scale networks rather than networks on 
a genome-wide scale. The complexity of the algorithm is quadratic in the number 
of variables and cubic in the number of time points. The algorithm is validated on 
a recently published Boolean network model of segment polarity development in 
Drosophila melanog aster. 
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1 Introduction 



Since the advent of DNA microarray technology several techniques from math- 
ematics, statistics, engineering, and computer science have been adapted or 
newly developed for the purpose of using microarray and other data to reverse 
engineer the structure of gene regulatory networks. An important goal of sys- 
tems biology is to discover and model the causal relationships between the 
components of such networks and the mechanisms that govern their dynamics 
(Kitano, 2002). Existing reverse-engineering methods can be broadly catego- 
rized as continuous vs. discrete and deterministic vs. stochastic. Some methods 
aim to discover only the network topology, that is, which genes regulate which 
others, with a directed graph or "wiring diagram" as output, possibly signed 
to indicate activation or inhibition. The goal of other methods is to describe 
the dynamics of the network. We first describe some of these methods in order 
to place the one proposed in this paper into context. 

Several methods have been proposed that reconstruct only the wiring diagram 
of the network. Bayesian network methods, first proposed in (Friedman et al, 
2000), and further developed in (Hartemink et al, 2001), can be apphed to 
single time points. This approach was applied in (Hartemink et ai, 2001) to 
data from 52 Saccharomyces cerevisiae Affymetrix chips, in order to analyze 
and score models of the gene regulatory network responsible for the control of 
genes necessary for galactose metabolism. The network considered involved 7 
nodes. In (Filkov et al, 2002) another statistical method was proposed that 
compares expression profiles from a time series of measurements. As an appli- 
cation the authors analyzed the data sets on S. cerevisiae in (Cho et ai, 1998) 
and (Spellman et al, 1998), which consist of four microarray time series. 

A method adapted from metabolic control analysis was proposed in (de la 
Fuente et al, 2002). It requires input data based on very small perturbations 
of the rate of expression of each gene. The method was applied to simulated 
data using a model, published in (Mendoza et al, 1999), of the gene network 
that controls flower morphogenesis in Arabidopsis thaliana, involving 10 genes. 
Using 10% perturbations, the authors were able to completely reconstruct 
the network of regulatory relations. While this survey of methods to reverse 
engineer gene networks in the form of a wiring diagram is far from complete, it 
provides a flavor of the variety of approaches, the scale at which the methods 
are being applied, and the challenges that result from a lack of appropriate 
experimental data. 

The most common approach to the modeling of dynamics is to view a gene reg- 
ulatory network as a biochemical network of gene products, typically mRNA 
and proteins, and to describe their rates of change through a system of ordi- 
nary differential equations. Thus, the modeling framework is that of a time- 



2 



continuous, deterministic dynamical system. We describe in detail the method 
in (Yeung et ai, 2002), as it shares many features with our method, even 
though it uses a different modeling framework. According to the authors the 
method is intended to generate a "first draft of the topology of the entire 
network, on which further, more local, analysis can be based." The authors 
make two assumptions. 

Firstly, the system is assumed to be operating near a steady state, so that 
the dynamics can be approximated by a linear system of ordinary differential 
equations: 

dx- ^ 
at -^^ 

for i = 1, . . . ,N. Here, Xi, . . . ,xi^ are mRNA concentrations, the Aj are the 
self-degradation rates, the 6j are the external stimuli, and the C,i represent 
noise. The (unknown) Wij describe the type and strength of the influence of 
the jth gene on the ith gene. They assemble to a square matrix W of real num- 
bers. The output of the reverse-engineering algorithm is this matrix W. The 
input is a series of data points obtained by applying the stimulus (61, ... , &jv)"^ 
and measuring the concentrations Xi, . . . , xjy M times. Assembling these mea- 
surements into a matrix X, neglecting noise, and absorbing self-degradation 
into the coupling constants Wij, we obtain a matrix equation 

-f-X = WX + B. 

at 

Here, X is an (A^ x M)-matrix, an (A^ x A^)-matrix, and i? an (A^ x M)- 
matrix. Using singular value decomposition (SVD) one obtains 

X^ = UWV^, 

where U and V are orthogonal to each other. The first step is to obtain a 
particular solution Wq to the reverse-engineering problem. One then obtains 
all possible solutions to the problem as 

W ^Wo + CV^, 

where C ranges over the space of all square {N x A")-matrices whose entries 
are equal to for a certain range of j and arbitrary otherwise. Equivalently, 
CV^ ranges over all matrices that vanish on the given time points. 

The second assumption made in the paper is that gene regulatory networks are 
sparse. This provides a selection criterion on which to base a particular choice 

of C, and hence of W . That is, the method selects the sparsest connection 
matrix W . This is accomplished through a particular choice of norm, and 
robust regression. The algorithm was validated by way of simulated data from 
three networks. 
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The other end of the model spectrum takes the view of a gene regulatory 
network as a logical switching network. First proposed in (Kauffman, 1969), 
Boolean network models have the advantage of being more intuitive than 
ODE models and might be considered as a coarse-grained approximation to 
the "real" network. They differ from ODE models in that time is taken as 
discrete and gene expression is discretized into only two quantitative states, 
as either present or absent. There is increasing evidence that certain types 
of gene regulatory networks have key features that can be represented well 
through discrete models or hybrid models; see, e.g., (Filkov and Istrail, 2002). 
Several algorithms for reverse engineering of Boolean network models for gene 
expression have been proposed. The algorithm REVEAL (Liang et al, 1998) 
uses as a criterion for model selection the concept of so-called mutual infor- 
mation. For a given experimental data set (assumed to be already discretized 
into a binary data set), that is, a given set of state transitions, the algorithm 
finds a Boolean function for each node of the network that "optimally" de- 
termines the output from the input by using as few variables as possible. In 
essence, the algorithm finds the sparsest possible Boolean network that is con- 
sistent with the data. The search is done by enumeration. In order to make 
this process feasible, the number of inputs to the functions is restricted to at 
most three. The algorithm has been tested by the authors on simulated data 
sets and was found to perform very efficiently. Another algorithm for Boolean 
network identification was given in (Akutsu et ai, 1999). It is in essence also 
an enumeration algorithm, applied to networks in which the in-degree of each 
node is at most 2. 

One of the disadvantages of the Boolean network modeling framework is the 
need to discretize real- valued expression data into an ON/OFF scheme, which 
loses a large amount of information. In response to this deficiency, multi-state 
discrete models and hybrid models have been developed. The most complex 
one (Thomas, 1991; Thieffry et al, 1995; Thieffry and Thomas, 1998) uses 
multiple states for the genes in the network corresponding to certain thresh- 
olds of gene expression that make multiple gene actions possible. The authors 
are most interested in the modeling and function of feedback loops. The model 
includes a mixture of multi-valued logical and real- valued variables, as well as 
the possibility of asynchronous updating of the variables. While this modehng 
framework is capable of better capturing the many characteristics of gene reg- 
ulatory networks than Boolean networks, it also introduces substantially more 
computational complications from a reverse- engineering point of view, espe- 
cially the ability of asynchronous update, which adds orders of magnitude to 
the combinatorial explosion of possibilities. Multiple discrete expression levels 
were also used in the reverse-engineering method in (Repsilber et ai, 2002), 
which uses genetic algorithms to explore the parameter space of multistage 
discrete genetic network models. 

In (Brazma and Schlitt, 2003) a hybrid modeling framework was introduced 



4 



that tries to capture discrete as well as continuous aspects of gene regulation. 
The authors' finite state linear model has a Boolean network type control com- 
ponent, as well as linear functions that represent an environment of substances 
that change their concentrations continuously. According to the authors, this 
framework can be generalized to more than two states for the logical variables. 
The reverse-engineering algorithm described in (Brazma and Schlitt, 2003) is 
based essentially on enumeration of all possible functions that fit a given data 
set. 

Finally, it is worth mentioning a Boolean network approach that incorporates 
stochastic features of gene regulation. Probabilistic Boolean networks have 
been introduced in (Shmulevich et al., 2002). While this survey of existing 

reverse-engineering methods is by no means comprehensive, it provides a con- 
text for the new method proposed in this paper. For more thorough reviews 
the reader can consult, e.g., (Bower and Bolouri, 2001) or (de Jong, 2002). 

While continuous and discrete modeling approaches seem to be far apart, it is 
useful to keep in mind that most ODE models cannot be solved analytically 
and that numerical solutions of such systems involve the approximation of the 
time-continuous system by a time-discrete one. Furthermore, when validating 
an ODE model using microarray data it is often necessary to utilize thresholds 
for continuous concentrations. The connection between an ODE system and 
an associated discrete system that captures information about the continuous 
dynamics has been formalized in (Lewis and Glass, 1991). So, in the end, the 
two modeling paradigms might not be as far apart as it appears. 

The modeling framework we describe in the next section is of the discrete 
multi-state variety. We propose here an approach that describes a regulatory 
network as a time-discrete multi-state dynamical system, synchronously up- 
dated. We further make the additional assumption that the set of states of 
the nodes in the network can be endowed with the algebraic structure of a 
finite number system, which allows us to use techniques and algorithms from 
computational algebra. As mentioned earlier, it is difficult for any reverse- 
engineering method to be validated using a real system as a test case, firstly 
because a suitable data set might not be available and, secondly, model pre- 
dictions might be hard to verify without extra experiments. For this reason, 
almost all methods discussed above have used simulated data for validation. 
We have chosen to use a recent Boolean network model for segment polarity 
development in Drosophila melanogaster (Albert and Othmer, 2003), consist- 
ing of sixteen nodes per cell, representing genes and gene products. The model 
is sufficiently complex to be of interest, but small enough so we can compare 
in detail the agreements and disagreements of our reverse-engineered network 
with the real one, thereby illustrating the performance and limitations of our 
method. 
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2 Reverse Engineering of Polynomial Systems over Finite Fields 

As mentioned above, we adopt the modeling framework of time-discrete multi- 
state dynamical systems. Let X be the set of possible states of the nodes of the 
network, and assume that X is a finite (but possibly large) set. One should 
think of X as a set of discretized expression levels, for instance. (We will 
address the issue of data discretization in a later section.) Let 

/ : X" — ^ X" 

be a discrete dynamical system over X of dimension n. Then / can be de- 
scribed in terms of its coordinate functions /j : X" — ^ X, ior i — 1, . . . ,n. 
That is, if X = (xi, . . . , x„) e X" is a state, then /(x) = (/i(x), . . . , /n(x)). 
We refer to such systems as finite dynamical systems. 

The reverse-engineering problem we are focusing on is one of model selection 
and can be stated as follows. 

Given one or more time series of state transitions, generated by a biological 
system with n varying quantities, choose a finite dynamical system / : X"- — > 
X", which fits the data and "best describes" the biological system. To be 
precise, we presume that we are given a set of state transitions of the network, 
in the form of one or more time series. That is, we are given sequences of 
states 

— (^11) ^21) ■ ■ ■ ) ^nl)i ■ ■ ■ 1 — {^Imi ■ ■ ■ i ^nm) 
tl = (^11, t21, . . . , tnl), ... ,tr — {tin ■ ■ ■ i tnr) 

These satisfy the property that, if the unknown transition function of the 
network is /, then 

/(Sj) = Si+i, i = 1, . . . ,m - 1 

Typically, there will be more than one possible choice. In fact, unless all state 
transitions of the system are specified, there will be more than one network 
that fits the given data set. Since this much information is hardly ever available 
in practice, any reverse-engineering method has to choose from a large set 
of possible network models. In the absence of a good understanding of the 
properties and characteristics of gene regulatory networks, one is limited to 
some type of Occam's razor principle for model selection. Before describing 
the principle we use, we first need to describe our computational framework. 
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We now make the further assumption that our state set X is chosen so that it 
can be given the structure of a finite field (Lidl and Niederreiter, 1997), that 
is, a finite number system. This is possible whenever the number of elements 
in X is a power of a prime number. This can be easily accomplished by an 
appropriate choice of data discretization. One possible approach is to choose 
a prime number p of possible states, in which case the number system can 
be taken to be Z/p, the integers modulo p. An important consequence of this 
assumption is the well-known fact (see, e.g., (Lidl and Niederreiter, 1997), 
p. 369) that each of the coordinate functions of / can be expressed as a 
polynomial function in n variables, with coefficients in X, and so that the 
degree of each variable is at most equal to the number of elements in X. 

Example. Boolean functions can be represented as polynomial functions with 
coefficients in Z/2 = 0, 1. Indeed, if x and y are Boolean variables, then 

X f\y — X ■ y,x\/ y — X -\- y -\- X ■ y,->x — X -\- 1. 

Note that addition does not correspond to the logical OR function, but rather 
the exclusive OR function, XOR. Since every Boolean function can be written 
in terms of these three Boolean operations, we sec that every Boolean function 
can be represented as a polynomial function on the field with two elements. 
Conversely, in hght of the above fact, we see that any function (2/2)" — > 
(Z/2)" can be realized as a Boolean network. 

So assume now that our state set is a finite field, denoted k. As observed above, 
the model / we are searching for is determined by its coordinate functions 
fi : — > k. We can reverse engineer each coordinate function independently 
and thus reconstruct the network one node at a time. Our algorithm is very 
similar in outline to that in (Yeung et ai, 2002). We first compute the space of 
all networks that are consistent with the given time series data. We then choose 
a particular network / = (/i, . . . , /„) that satisfies the following property: 

Criterion for model selection. For each i, fi is minimal in the sense that 
there is no non-zero polynomial g G k[xi, . . . , x^] such that f = h + g and g 
is identically equal to zero on the given time points. 

That is, we exclude terms in the polynomials fi that vanish identically on the 
data. No reverse-engineering method is able to detect such terms without prior 
information, even though they may exist in the real network but be nonfunc- 
tional under the particular experimental conditions. Note that this criterion is 
different from that used in (Yeung et al., 2002) and also by REVEAL. In both 
of those algorithms the search is for the sparsest network, that is, a network 
such that each node takes inputs from as few variables as possible. 

In terms of the coordinate functions, the basic mathematical reverse-engineering 
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problem is then as follows, formulated for one time series. The statement of 
the problem for several time series is similar. 

Problem. Suppose we are given a collection of states Si = (sn, . . . , s„i) , . . . , = 
{sim, ■ ■ ■ , Snm), ^ choice of Coordinate i e {1, . . . , n}. 

(1) Find all polynomial functions fi G k[xi, . . . , Xn] such that 

That is, find all polynomials that map to the ith coordinate of the 
next time point Sj+i. 

(2) From that set of functions choose one that does not contain any terms 
that arc identically equal to zero at all time points (which is unique; see 
Appendix A). 

The key advantage of the polynomial modeling framework over a finite field is 
that there is a well-developed algorithmic theory that provides mathematical 
tools for the solution of this problem which have polynomial time complexity in 
the input. Thus, we can overcome one disadvantage that discrete models have 
compared to ODE models, for which there is a well developed mathematical 
theory. This feature has been an important reason for the use of polynomial 
systems over finite fields in engineering, in particular control theory (see, e.g., 
(Marchand and Le Borgne, 1998)). We now describe the algorithm. 

We fix one coordinate/node in the network and reverse engineer its transition 
function. To simplify notation, we assume that we have given a time series 
Si, . . . , s„i, as above, together with elements ai, . . . ,am € k, and we are looking 
for all polynomial functions / e k[xi, . . . ,Xn] such that /(sj) = aj for all 
j — 1, . . . ,m. First we compute a particular polynomial /o that fits the data. 
There are several methods to do this, Lagrange interpolation being one of 
them. We use the following formula, based on the so-called Chinese Remainder 
Theorem (see, e.g., (Lang, 1971), p. 63): 

m 

/o(x) = ^ajrj(x), 

where the polynomials rj are defined as follows. Let 1 < i j < m. li Si ^ Sj, 
then find the first coordinate I in which they differ. Define 

bij{x} = {sj^i - Si^if'^ {xi - Si^i) 

for every i j. Then, for i ^ j, 

If Si = Sj, then we have reached a limit cycle. We can restrict the time series to 
the states Si, . . . , Sj_i. We note that rfe(sfc) = 1 and rfc(x) = otherwise. It is 
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straightforward to check that this polynomial function does indeed interpolate 
the given time series. 

Now consider two polynomials f,g& k[xi, . . . , Xn] such that 

/(Sj) = % = ^(Sj)- 

Then (f — g){sj) = for all j. That is, any two such functions differ by a 
function that is identically equal to on the given time series. So, in order to 
find all functions that fit the data, we need to find all functions that vanish on 
the given time points. Note that the set of all such functions is closed under 
addition and multiplication by any polynomial function in k[xi, . . . ,Xn], and 
so forms a so-called ideal I in the set of all polynomials. In Appendix A we 
describe the details of an algorithm to compute generators for I (similar to a 
basis for a vector space). The algorithm computes polynomials gi, . . . ,gr G / 
such that any / G / can be expressed as a linear combination 

r 
i=l 

for some polynomials hi G k[xi, . . . ,Xn]- The next step is to reduce the poly- 
nomial /o found in the previous step modulo the ideal /, that is, write /o 
as 

fo^f + g, 

with g ^ I. Furthermore, / is minimal in the sense that it cannot be further 
decomposed into f + h with /i G /. In other words, g represents the part of 
/o that lies in /, and that therefore is identically equal to on the given time 
series. Then we obtain all possible functions that interpolate the time series 
in the form f + g, where g runs through all elements of /. We summarize the 
different steps. 

Reverse-engineering algorithm. (For one node of the network.) 

Input: A time series Si, . . . , G /c" of network states, together with expres- 
sion levels Qi, . . . ,am & k. 

Output: A polynomial function / G k[xi, . . . ,Xn] such that /(sj) = aj and 
such that / does not contain component polynomials that vanish on the time 
series. 

(1) Compute a particular solution /o from the formula. 

(2) Compute the ideal / of all functions that vanish on the data. 

(3) Compute the reduction / of /o with respect to /. 

Some comments on the features of the algorithm and its complexity are in or- 
der. Step (1) is straightforward. Step (2) is where the bulk of the computation 
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takes place. The first observation is that the ideal / can be computed as the 
intersection of the ideals 



for J = 1, .... m. Intersections of ideals can be computed algorithmically. See, 
e.g., (Cox et al, 1997), p. 185, for details. The computation relies on the 
computation of a so-called Grobner basis for several intermediate ideals. It is 
known that the worst-case complexity of Grobner basis calculations is dou- 
bly exponential. In the case of finite fields, however, the calculations needed 
can be reduced essentially to linear algebra. Our algorithm is very similar to 
one used in algebraic statistics, for the selection of models for factorial de- 
sign problems (Robbiano, 1998). The only difference is that our algorithm 
computes /o and / separately, whereas the algorithm in (Robbiano, 1998) 
combines the two. (Since / is the same for all nodes, we only need to com- 
pute it once if done separately.) It is shown there (Theorem 3.1) that the 
algorithm is quadratic in the number of variables and cubic in the number of 
time points. This algorithm is implemented in the computer algebra system 
CoCoA (cocoa. dima. unige.it). For the results in this paper we have used 
the computer algebra system Macaulay2 (www.math.uiuc.edu/Macaulay2), 
which uses the standard algorithms for computation of Grobner bases. We 
emphasize that no portion of this algorithm is based on enumeration. 

As explained in Appendix A, Grobner basis calculations require the choice of 
an ordering of the terms in the polynomials, in particular an ordering of the 
variables. This is necessary in order to carry out long division of multivariate 
polynomials. The end result of the calculations depends on the term ordering 
chosen, in the sense that "cheaper" variables, that is, those that are smallest 
in the ordering, will be used preferentially in both the interpolation as well 
as the reduction calculations. With prior information about the existence of 
certain regulatory relationships this feature can be used to incorporate this 
prior knowledge into the algorithm. (Likewise, using the Elimination Theorem 
(see Appendix A), one can exclude certain variables from appearing.) In gen- 
eral, however, this dependence makes the interpretation of the precise form 
of the functions difficult at times. In those cases we resort to building a con- 
sensus model using a collection of variable orderings that makes each variable 
"equally costly on average." We can then extract from these different models 
common regulatory links and common nonlinear terms. In the next section we 
illustrate this strategy with a detailed discussion of an example. 
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3 An Application 

In this section, we present an application to the well-characterized network 
of segment polarity genes in the fruit fly D. melanogaster. In (Albert and 
Othmer, 2003) and (von Dassow et ai, 2000) two models were suggested for 
the regulatory interactions in a network of five segment polarity genes and their 
products. In (von Dassow et ai, 2000) the authors proposed a continuous-state 
model using ordinary differential equations. In contrast, the Albert-Othmer 
model is discrete, in that the functions governing state transitions are Boolean 
functions. Both models were built from inferences given gene and protein 
expression data. Figure 1 depicts the graph of connections in the Boolean 
model in (Albert and Othmer, 2003). In the graph, nodes represent mRNAs 
and proteins. An edge between nodes indicates that the node at the head 
is involved in the regulation of the tail node. For example, an edge A^B 
between protein nodes A and B implies that A regulates the transcription and 
translation of B, whereas an edge A— from protein A to mRNA b implies that 
A regulates the transcription of gene b. Note that edges denote the existence 
of regulation, not the type, whether activation or inhibition. In Table 1 which 
follows are some of the Boolean functions that accompany the model in Figure 
1. Superscripts denote time and subscripts location relative to the current cell. 




Neighbor Cell Neighbor 



Fig. 1. The graph of interactions in the Boolean network developed in (Albert and 
Othmer, 2003). Ovals = mRNAs, rectangles = proteins. SLP denotes 2 forkhead 
domain transcription factors encoded by the gene sloppy paired and is believed to 
activate the segment polarity genes depicted in the model (Cadigan et ai, 1994). 
PH is the protein complex formed by the binding of HH to PTC (Ingham and 
McMahon, 2001). The protein SMO is encoded by the gene smoothened. Because its 
transcription is not regulated by any molecular species in the model, smoothened is 
not represented. 



11 



k 


= n'l^i = 






= HHl+^ 


= hh\ 


h 


= Vtc{^^ = 




/9 


= PTCl+^ 


= pt4 V {PTCl A -^HHU A ^HHl^,) 



Table 1 

Sample Boolean functions for the network in Figure 1. 

The genes that are being modeled are wingless (wg), engrailed (en), hedgehog 
(hh), patched (ptc), and cubitus interruptus (ci). Our goal here is to reverse 
engineer the Boolean network A/" in Figure 1 from time series generated by it, 
including null mutant time series for each of the five genes in the network. 

The network consists of a ring of 12 interconnected cells, which are grouped 
into 3 parasegment primordia and the genes are expressed every fourth cell. 
Since each cell of the ring has the same network of segment polarity genes, we 
focus our work on the reconstruction of the network in one cell, taking into 
account intercellular connections. For more details, sec (Albert and Othmer, 
2003; von Dassow et al., 2000). Note that for our purposes it is irrelevant 
whether the model is indeed correct. 

The starting point for our reverse- engineering algorithm is a collection of time 
series of discrete expression data. We used known configurations of states 
(Cadigan et al., 1994; Hooper and Scott, 1992) for the 5 genes as initializa- 
tions and generated times series for the wildtypes using the Boolean functions. 
All of the initializations terminate in steady states when evaluated by the 
Boolean functions in Table B.l in Appendix B. To account for intercellular 
dependencies we included 6 extra nodes into our model. Table 2 contains the 
translations of the Boolean functions in Table 1 into polynomial functions in 
the variables Xi, . . . , 0:21, with coefficients in Z/2. (See Table B.2 in Appendix 
B for all other polynomials.) 

/e = X5 (xi5 + 1) 
h = XQ 

fs = xi3(X + a;2i + Xx2i)(a;4 + l)(a;i3(xii + l)(a;20 + l)(a;2i + l) + l) 
X := {xii + X20 + a;iiX2o) 

/g = X8 + Xg {xis + 1) (xig + 1) + XsXg (xi8 + 1) (xig + 1) 

Table 2 

Polynomial representations of the sample Boolean functions in Table 1. See Ap- 
pendix B for the legend of variable names. To account for simultaneous updating 
we substituted simultaneously updated terms with their function expressions (e.g. 
in fs we replaced EN with en). 

The size of the state space of the Boolean network in each cell is 2^^, involving 
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multiple components. Any single trajectory in that space vastly underdeter- 
mines the network. If we use no known biological information other than the 
expression data of the wildtype to construct a discrete polynomial model, our 
method predicts the network to have 20 links, of which 14 are in the network 
( jV has 44 links) . (A description of the method with which we construct the 
consensus model is given below.) The performance of the algorithm dramat- 
ically improves, however, if we incorporate knock-out time series for the five 
genes, which we demonstrate below. 

To simulate an experiment in which node Xi representing a gene is knocked out, 
we set its corresponding update function fi in Table B.2 to (and kept all other 
functions the same). We also set the corresponding functions in neighboring 
cells equal to 0. We then generated a new time series by iteration of the given 
initiahzations. With these data, we constructed the model in Figure 2. 

Recall that we first construct all polynomial models (sets of polynomial func- 
tions) which fit a discrete time series. Then we select the one which is minimal 
with respect to summands in each of the functions which evaluate to on all 
time points. The algorithm relies on the choice of a total ordering for all 
possible monomials in the given variables, in particular, a total ordering of 
the variables themselves. The effect of such a variable ordering is that the 
"cheaper" variables, those that are smaller in this ordering, are used prefer- 
entially in the algorithm. In order to counteract this dependency, we use a 
consensus model extracted from four different choices of variable ordering: the 
ordering xi < ■ ■ ■ < X21, the reverse ordering xi > • ■ • > X21, and 2 orderings 
which make "interior" variables greatest and least. Using each ordering we ran 
the algorithm and intersected them to obtain a consensus model (see Figure 
2). 

The consensus model we construct, which incorporates data from the wild- 
type and knock-out mutants, results in prediction of 37 of the 44 links in the 
Boolean network. We correctly identified the species which regulate the tran- 
scription of genes wg and ci and the synthesis of proteins SLP, WG, EN, HH, 
PTC, and CI. We summarize the model of interactions and compare it to the 
model in Figure 1. In this model we assume that transcription and translation 
occur in 1 consecutive time step. We will discuss reconstruction of certain 
dynamic properties at the end of the section. 

In determining which species affect the transcription of the gene hh, we found 
a function that involves fewer terms, than the polynomial representation of its 
counterpart in the Boolean model. Specifically, the function is in terms of the 
gene product EN. The function satisfies all time series, including knock-out 
data, generated by its associated Boolean function. The discrepancy lies in 
the existence of a term in the Boolean function which does not contribute to 
the updating of hh, that is, a term that is constant equal to on all input 
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Fig. 2. The graph of the consensus model, constructed from the wildtype and 
knock-out time series. The graph is the intersection of 4 polynomial models, one for 
each variable ordering used. Dashed lines are links that appear in 3 of the 4 models. 

time points. However, links whose effects are not reflected in the given data 
are not detectable by any reverse-engineering method unless prior information 
about the link is given. Similarly, the Boolean function for en also contains 
such terms, which accounts for differences in regulation dependencies. We 
infer that transcription of the gene en is regulated by Wingless protein in an 
anterior cell. 



In this model, transcription of the patched gene to synthesize ptc mRNA in a 
given cell is regulated by the following molecular species: the proteins PTC, 
PH, SMO, CI and WG in the cell anterior, as well as hh from neighboring 
cells. We were not able to conclude that en regulates the transcription of the 
gene. It is worth observing that this regulatory link is not supported in the 
von Dassow model. The Albert-Othmer model predicts that transcription of 
the ptc gene is regulated only by the whole CI protein and its repressor form 
CIR. In fact, the authors explain that the cleavage of CI creates the repressor 
fragment CIR, whose synthesis is repressed by hh product in neighboring 
cells. This effect may be too indirect to be recovered from the data. Our 
identification of WG protein in adjacent cells as a regulator and our failure 
to detect activation or repression by engrailed (whose transcription depends 
on WG) suggest that WG may have a more direct effect on the regulation of 
ptc. Further, it is possible that existence of regulation by en is inconclusive, 
given the data, especially in light of the discrepancy between the Boolean and 
continuous models. Currently we have no evidence to support regulation by 
PTC or by PH. 

For the protein complex PH, we detected that its synthesis is regulated itself 
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and by ptc, hh and their corresponding products in adjacent cells. Recall that 
PH is the molecule formed when HH from adjacent cells binds to the trans- 
membrane receptor PTC. In (Albert and Othmer, 2003) the authors assume 
in their model that this binding occurs instantaneously since it is known that 
the reaction occurs faster than transcription or translation (which they also 
presuppose to require 1 time unit for completion). Therefore, we infer that the 
synthesis of PH at a given time step is determined by the expression levels 
of ptc, hh and hh product, from neighboring cells, in the previous time step. 
However, we did not identify PTC as a direct regulator. As PH may be trans- 
ported into other cells via exocytosis (Taylor et al., 1993), its expression levels 
may not be maintained. This suggests that its presence in a cell signals its 
transport out of the cell, hence implying a type of "autoregulation" . 

We identified the gene ptc, its product, the complex PH, and extracellular hh 
to affect synthesis of SMO. The binding of PTC to SMO inactivates it via post- 
translational modification (Ingham, 1998). However, if PTC binds to HH, then 
SMO is activated (Ingham, 1998). So the translation of smoothened mRNA is 
dependent on the synthesis of the complex PH. (The mRNA smoothened was 
not included in the model since the gene is transcribed ubiquitously through- 
out the segment (Ingham, 1998) and its transcription is not regulated by any 
biochemical in the model.) Since this binding is assumed to occur instanta- 
neously, as described above, we instead detect the regulation by hh in adjacent 
cells and not its product. 

While we correctly identified regulation by SMO, CI, and hh in neighboring 
cells for the activating (CIA) and repressing (CIR) transcriptional factors 
transformed from CI, we predicted two other sources of regulation, namely 
by PTC and PH. There is evidence that PH activates signalling by SMO 
(Chen and Struhl, 1998), causing the transformation of CI into CIA, inhibiting 
cleavage of CI. Further, the rate of cleavage is a function of the amount of free 
PTC in the cell (von Dassow et ai, 2000). The presence of the links from 
PTC and PH to CIA and CIR suggests that their indirect regulation can be 
detected in the data. 

To summarize the performance of our algorithm for detecting network topol- 
ogy, we show that the use of expression levels of wildtype and null mutants 
enables detection of about 84% of the interactions in the Boolean model. In 
contrast, using data only from the wildtype yields a mere 32% identification. 

Next we focus on reverse engineering the dynamics of the Boolean model, that 
is, the Boolean functions on each of the network nodes. As pointed out above, 
the functions in the Boolean model contain terms that evaluate to on all 
input data, so that we are unable to detect the corresponding relationships. 
In order to compare the dynamics predicted by our method with the Boolean 
model we reduce the polynomials in Table B.2 by removing vanishing terms, as 
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described in the previous section. Note, however, that this reduction depends 
on the choice of term ordering, as does the output of our algorithm. For each 
choice of term ordering the reduced functions of the Boolean model and the 
functions reverse engineered from the data agree exactly (see Table 3). This 
shows that we are able to completely predict the Boolean model's dynamics 
from the wildtype and knock-out data. However, due to the sensitivity of 
our method to the chosen term ordering, the particular form of the reverse- 
engineered functions may not be directly interpretable in terms of regulatory 
relationships. We therefore proceed as before by extracting information about 
nonlinear terms containing products of more than one variable common to the 
reverse-engineered functions for multiple term orderings. These are analogous 
to mixed terms in ODE models. 



In the consensus model constructed from the wildtype, all polynomials are lin- 
ear, suggesting that the concentrations of all molecular species in the network 
are regulated by biochemicals working independently. However, Albert et al. 
(2003), von Dassow et al. (2000), and others argue differently, as indicated in 
their models. For a more accurate dynamics, we include expression data for 
the null mutants. 



In the consensus model built with knock-out data, we found the following 
nonlinear interactions. For the proteins SMO and PH, we found that their 
synthesis depends on the "interaction" between the genes ptc and extracel- 
lular hh. Specifically, the terms \ptc\ * [hhi-\\ and \ptc\ * [/i/ij+i] appear in 
the polynomial functions for SMO and PH, for all term orders, which also 
appear in the Boolean functions for these two proteins. In the function de- 
scribing transcription of wg, the terms [tuf?] * [CI A] and *[CIR] are present, 
implying that its expression results from interactions between itself and the 
activator and repressor forms of the protein CI. These products also appear 
in the Boolean function for wg. Furthermore, the function of the protein frag- 
ments of CI include the terms [PTC]*[/i/ij_i] and [PTC]*[/i/ii+i], both of which 
exist in the corresponding Boolean functions. Finally, in any cell, our method 
detects that PTC and hh from a neighboring cell interact, affecting the tran- 
scription of ptc. As mentioned above, we have no evidence to support this 
implication. 



Our method shows a marked improvement when knock-out data are included. 

We were able to reconstruct about 84% of the topology of the Boolean network, 
versus only 32% when knock-out data are not included. Further, we identified 
10 nonlinear interactions, versus none in the model constructed with only 
wildtype data. 
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/e = 

h = 

h = a;i2Xi3 + XisXie + Xi32;20 + X18X20 + 3:13X21 + X19X21 + 2:13 + Xis + Xig 

/g = X10X14 + XuXis + XMXig + XgX20 + a:i82;20 + 2;gX21 + XigX21 + Xs + Xg + Xio 

Table 3 

Sample Boolean functions reduced by the ideal of wildtype and knock-out time 
series. 

4 Data Discretization 

Since gene expression data are real numbers, the first step in any reverse- 
engineering algorithm using discrete models must be to discretize these real 
numbers into a finite (typically small) set of possible states. For Boolean net- 
works this simply amounts to the choice of a single threshold for the expres- 
sion level of each gene, below which a gene is considered inactive. The issue 
of data discretization has been studied extensively, in particular from the 
points of view of Bayesian network applications and machine learning; see, 
e.g., (Dougherty et al, 1995) and (Friedman and Goldszmidt, 1996). Obvi- 
ously, the way one discretizes the data plays an important role in what model 
one obtains. The first important choice is the number of discrete states al- 
lowed. In our case, this is the choice of p in the algorithm. It follows from 
results in (Green, 2003) that for p large enough the result of the reverse- 
engineering algorithm does not depend on p anymore, in the sense that the 
monomials in the polynomials remain the same, possibly with different coeffi- 
cients. While that paper does not give an algorithm for choosing a suitable p, 
extensive experiments with networks simulated with the biochemical network 
simulation program Gepasi (http://www.gepasi.org) suggest that for data 
sets up to 50 nodes an optimal p is in the range between 5 and 13. The ef- 
fect of the choice of p is demonstrated in Figures 3 and 4. This example also 
demonstrates that the availability of more than two discrete states can be very 
helpful for modeling purposes. 



Network of 10 Genes 



1 ■ 




Fig. 3. The graph of the real-valued time series for a network Q of 10 genes. 

There are various ways to attach a discrete label to real-valued data. Thresh- 
olds with biological relevance is one type of labeling that can be used. For 
example, up-regulation, no regulation, and down-regulation of a gene are 
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Fig. 4. The graph of the discrete time series for Q: p = 2 (left) and p = 13 (right) 
with discretization method = global as described above. 

two thresholds that can partition a given data set into 3 groups, with la- 
bels 1, —1,0, respectively. This maps the real-valued data into values in Z/3. 
More thresholds can be integrated to refine the partitioning of the data set. 
Another method of discretization is to normalize the expression of each gene 
or protein and use the deviation from the mean to discretize the data. 



5 Effect of Noise on the Algorithm 



In the previous section, one can see that the choice of prime provides different 
levels of resolution of the real- valued time series, yielding sharp differences in 
the discrete time series. To minimize this effect, a suitably large prime should 
be considered (Green, 2003). However, data collected from experimental pro- 
cesses typically have errors introduced due to deficiencies in methodology or 
imperfections in instrumentation, as is the case in microarray data. Further- 
more, biological systems, such as biochemical pathways, exhibit variability in 
population or concentration levels. For our purposes it is important to quantify 
this noise. 

Because our method begins by discretizing real-valued time series, one would 
expect that discretization will smooth out some of the noise. To test this 
hypothesis, we incorporated random noise, using a normal distribution, into 
time series generated from 100 simulated networks generated by the AGN 
software described in (Mendes et ai, 2003). We generated 25 networks with 
20 nodes for each of the 4 topologies supported by AGN. All time series are 
of length 11 time steps. We added noise in two ways: first, as a percentage of 
each data point, simulating biological variation, and second, as a percentage 
of a fixed value, simulating instrumentation error. In both cases, we added 
10% and 25% noise. We chose 3 different thresholds (0.5, 1.0, and 2.5) with 
which to Booleanize all time series. Then we computed the median number 
of entries that were changed in the noisy time series for both 10% and 25% 
noise, as compared to the original time series. 

The median number of entries that changed for all experiments is 5 ( 2.3% of 
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220). In the case when noise is a percentage of each data point, the median is 
2 ( 1% of 220), when the threshold was restricted to 0.5 or 2.5. For the exper- 
iments described above, we infer that 10% of the noise added is propogated 
to the discrete time series. 

To test for the effect of noise on our method, we added 1% noise to the 
discrete time series for the Boolean functions in (Albert and Othmcr, 2003). 
(We estimate this to correspond to approximately 10% noise in the "real" 
data, given the above results.) We then applied our method to the noisy time 
series and used 8 variable orderings to construct a consensus model. For 6 of 
the nodes (SLP, wg, WG, en, HH, and CI) we were able to identify all of the 
correct links, plus some false positives. For 5 of the nodes {ptc, PTC, SMO, 
CIR, and CIA), we identified half of the correct links. For the remaining nodes, 
we had no conclusive results. 



6 Discussion 

We have described a new method to discover regulatory relationships between 
the nodes in a gene regulatory network from data. The modeling framework 
is that of time-discrete dynamical systems with finite (but possibly large) 
state sets for the variables. The crucial further assumption is that the set 
of possible states for a variable can be given the structure of a finite field. 
As a consequence, we have available to us the very well-developed theory of 
algorithmic polynomial algebra, with a variety of implemented procedures. 
It is this machinery that we employ for the task of reverse engineering. In 
giving up a bit of freedom in the choice of state sets, we gain a mathematical 
framework well suited for reverse-engineering purposes. Our method is very 
similar to that in (Yeung et al, 2002), in that we first compute all possible 
networks that fit the given data. We do this not by enumeration, but rather 
by a procedure similar to describing all solutions of a homogeneous system of 
linear equations by computing a basis for the nullspace. Then, as in (Yeung et 
al., 2002), we choose a particular network. Our selection criterion is to choose 
polynomial functions that do not contain any terms which are identically equal 
to on the given data set. So, we choose a minimal network, not in terms of 
network connections like in (Yeung et al., 2002) and (Liang et al., 1998), but 
rather in terms of the structure of the functions. The rationale is that if a 
term vanishes on the data input, then no reverse-engineering method should 
be able to identify it without prior biological knowledge. 

We contrast our method with the discrete methods described in the introduc- 
tion. In its essence, the Thieffry-Thomas model can also be represented as a 
function from a finite set of states to itself, with dynamics generated by iter- 
ation of this function. The lack of any further mathematical structure in the 
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model makes its analysis very difficult. Moreover, this modeling framework 
does not lend itself to mathematical reverse-engineering methods. Even in the 
much simpler case of Boolean networks, all the algorithms discussed above 
rely at some point on enumeration of a large number of possible components 
of the putative network. 

For method validation we used the Boolean model of the D. melanogaster 
segment polarity network recently published in (Albert and Othmer, 2003). 
We generated time series and perturbed time series from this model and used 
them as input for the algorithm. Note that for our purposes it is irrelevant 
whether the model is in fact an accurate depiction of the real system. Of the 44 
links in the model we correctly identify 37, together with 13 links not present 
in the model. For each of the links that our method identifies incorrectly, we 
provide evidence as to possible reasons. Furthermore, we are able to identify 
many of the key nonlinear relations among variables in the model. 

In the absence of good approximation methods over finite fields, our method 
does exact fitting of data and is consequently quite sensitive to noise. Future 
work includes the development of approximation methods that will alleviate 
some of this sensitivity. A more systematic study of different data discretiza- 
tion methods and their effect on noise reduction will also be carried out. 
Finally, the method will be validated using both simulated networks and pub- 
lished models that generate real-valued data. 
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Appendix 



A Mathematical Background 

In this section we give the mathematical details of the reverse-engineering 
algorithm and some basic facts about computational algebra relevant to this 
paper. The basic problem that lies at the heart of computational algebra is 
that long division of multivariate polynomials differs from that of univariate 
polynomials in that the remainder is not uniquely determined. It depends on 
several choices that need to be made along the way. Univariate division of a 
polynomial / by another one, (7, proceeds by dividing the highest power of 
the variable in g into the highest power in /. For multivariate polynomials 
there are many choices of ordering the terms of a polynomial, which affects 
the outcome of the division. Also, when dividing more than one polynomial 
into a given one, the outcome typically depends on the order in which the 
division is carried out. To be precise, let /c be a field, e.g. the field of real 
or complex numbers, or a finite field, and let /, /i, . . . , G A;[a;i, . . . , a;„] 
be polynomials in the variables Xi, . . . The question whether there are 
polynomials gi, . . . ,gjn G k[xi, . . . , Xn] such that 

m 

f = Y1 9ifi 

i=l 

can in general not be answered algorithmically. In the language of abstract 
algebra, let / = (/i, . . . , fm) be the ideal in k[xi, . . . , x„] generated by the fi, 
that is, / is the set of all linear combinations J^Oifi, with g^ G k[xi, . . . ,Xn]- 
Then we are asking whether / is an element of /. This is known as the ideal 
membership problem. 

In general, the ideal I can be generated by sets of polynomials other than 
the fi, similar to a vector space possessing many different bases, and the so- 
lution to the ideal membership problem does not depend on the choice of a 
particular generating set. It turns out that if one chooses a very special type 
of generating set, known as a Grobner basis, for the ideal, then the ideal mem- 
bership problem becomes solvable algorithmically. There is a basic algorithm 
to compute a Grobner basis for an ideal, the Buchbergcr algorithm. Using this 
algorithm as a foundation, many problems in multivariate computational alge- 
bra can be solved algorithmically, including the computation of intersections 
of ideals, which is the key computation we are using in our algorithm. The cen- 
tral ingredient in this algorithm is the Elimination Theorem (sec (Cox et ai, 
1997), p. 113). This theorem is also the source of the very high computational 
complexity of many algorithms, since it requires the use of a computationally 
expensive type of term ordering. For details see, e.g., (Cox et al, 1997). 
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As described earlier, the basic idea of our algorithm is as follows. First we 
choose a term order for k[xi, . . . ,Xn]. This is necessary for all subsequent 
calculations. To compute the ideal / of all polynomial functions that are iden- 
tically equal to on a given collection {si} of points we proceed as follows. 
Let li be the ideal of all polynomials that take on the value on s^. It is 
straightforward to see that li contains the ideal 

(•^1 "Sjl ) • • • ) 2^1 "Sin) • 

But this ideal is maximal with respect to inclusion, so that it has to be equal 
to li. Then the ideal / is equal to the intersection 

m 
i=l 

Algorithms to compute the intersection of ideals are implemented in many 
specialized computer algebra systems. As mentioned earlier, we have used 
Macaulay2 for all computations. Finally the reduction of the special solution 
/o modulo / is simply the remainder of /o under division by a reduced Grobner 
basis of /. This too is a standard algorithm implemented in most computer 
algebra systems. It is important to observe that, if Qq is another particular 
solution to the interpolation problem, then /o and Qq differ by a polynomial in 
/, as shown before. Therefore, the reduction of /o by / is equal to the reduction 
of qq. Consequently, it does not matter which method we use to construct a 
particular solution. 



B Tables 



25 







, , -, \ it I mod A = 1 or I mod 4 = 2 


fl 


= 


\\ if i mod 4 = 3 or i mod 4 = 


f2 




wg]^ = [CIA\ A SLPl A ^CIR\) V (u;^- A (CI A* V SLPl) A ^CIR\) 


fz 




WG{^ = wgl 


£ 

74 




en- = Gi_i V W/G^+ij A -^SLP^ 


p 

75 




TT" Ar£H-l i 


/6 




= £;Ar/ A ^CIR\ 


/7 




HHl+^ = /i/i^ 


/8 




= c/yl*+^ A ^£;Ar*+i A ^CIRl+^ 


/9 




PTCl+'^ = ptc\ V {PTCl A ^HHl_^ A -^HHj^^) 


/lO 




PHl+^ = PTCl^^ A V 7/^*1^) 


/ll 






/l2 




= -£;iv* 


/l3 




= 


/14 




CM*^^ = C/* A [SMOl V V ^/if+i) 


/l5 




C/ii:*+i = C/| A A A 



Table B.l 

Boolean functions for the network in Figure 1. 
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fl = Xi 

/2 = X1X14 (Xi5 + 1)+X2 {Xi + Xi4 + XiXu) {xi5 + 1) + X1X2X14 (xis + 1)^ (xi + X14 + Xi^u) 

/3 = X2 

/4 = (icie + xi7 + xiexiy) (xi + 1) 

fe = X5 {Xi5 + 1) 

/? = a;6 

/s = a;i3 + a;2o + X11X20) + 3:21 + (a;ii +X20 + 3:11x20) 2:21) (2:4 + 1) 
(a:;i3 (a;ii + 1) (x2o + 1) (x2i + 1) + 1) 

fg = X8 + Xg (xis + 1) (xig + 1) + XsXg (xi8 + 1) (xig + 1) 

/lO = (X8 +X9 (Xi8 + 1) (Xi9 + 1) +X8X9 (Xi8 + 1) (Xi9 + 1)) (X20 + a;2i + X20a:2l) 

fll = Xs + XqY + XsXqY + 1 + X20 + {{X8 + XgY + XsXqY + 1) X20) + X21 
+ (X8 + Xgy + XsXgy + 1 + X20 + (xs + XgY + XgXgY + 1) X20) 3^21 

/l2 = X5 + I 

hi = Xi2 

fu = xi3 ((xii + X20 + X11X20) + 3:21 + (3^11 + X20 + 3;iiX2o) 3:21) 

/l5 = 3;i3 (Xii + 1) {X20 + 1) (X21 + 1) 



SLP 


wg 


WG 


en 


EN 


hh 


HH 


ptc 


PTC 


PH 


SMO 


Xl 


X2 


3^3 


X4 


X5 


Xq 


X7 


Xs 


Xg 


XlO 


Xll 


ci 


CI 


CIA 


CIR 


WG,_i 


WG,+i 


HHj_i 


HHj+i 


hhi^i 


hhi+i 


Y 


X12 


3^13 


Xi4 


3^15 


3^16 


Xi7 


3^18 


3;ig 


X20 


X21 


(Xi8 + 1) (xig + 1) 



Table B.2 

Polynomial representations of the Boolean functions in Table B.l, together with the 
legend of variable names. 
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/l 




Xi 


h 




XiXu + X2X14 + X2X15 + X2 


/3 




^2 


f . 






/5 




X4. 


76 




X5 


/7 




Xq 


/8 




X12XIZ + a;i3Xl6 + X\zX20 + X\sX20 + Xl2,X21 + Xl^X21 + Xl3 + XI8 + Xig 


/9 




a:;ioa;i4 + a;i4a;i8 + 2:142^19 + x^X2q + X18X20 + x^X2i 






+X19X21 + .X8 + .T9 + .Tio 


fio 




a;io2;i4 + xi/^xis + X14X19 + X8X20 + +a;8a^2i + a^io + a;i8 + a^ig 


fll 




a;8X2o + a;9a;20 + xisX2q + X8X21 + a;9a;2i + X19X21 + a;8 + 0:9 + xi8 + X19 + 1 


fl2 




X5 + 1 


fl3 




a:;i2 


fu 




xiixiz + xgX2o + xi8a;2o + 3:9X21 + xi9a;2i + xio + a;i8 + X19 


fl5 




xuxis + X9X20 + a:;i8a;20 + a;9a;2i + 3:19x21 + xio + X13 + Xi8 + X19 



Table B.3 

Boolean functions reduced by the ideal of wildtype and knock-out time series. 
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